To make a first evaluation of the given datasets, we compute some basic metrics.
For more information on the metrics, and for the extraction of the metrics for the smaller datasets, see:
`Evaluation metrics for picking an appropriate data set for our goals.ipynb`
For importing the four largest datasets into PostgreSQL and evaluating their metrics, see:
`Importing the large data sets to psql and computing their metrics.ipynb`
Finally, the computed metrics of all datasets are exported to the `metadata` directory
and imported here for visualization.
In [1]:
def percentage(some_float):
    # Format a ratio in [0, 1] as a truncated integer percentage, e.g. 0.25 -> '25%'.
    return '%i%%' % int(100 * some_float)

def metrics_comparison_matrix(reviews_df):
    # Format each row for display: the first five columns are ratios shown as
    # percentages, the sixth is a count cast to int, and the last two are kept as-is.
    # Positional access via .iloc avoids pandas' deprecated integer-label lookup.
    return reviews_df.apply(
        lambda row:
            [percentage(row.iloc[i]) for i in range(0, 5)]
            + [int(row.iloc[5]), row.iloc[6], row.iloc[7]],
        axis=1)
In [2]:
import pandas as pd

# Metric tables exported by the two notebooks referenced above.
small_data_metrics = pd.read_csv('./metadata/initial-data-evaluation-metrics.csv')
large_data_metrics = pd.read_csv('./metadata/large-datasets-evaluation-metrics.csv')
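A light optional sanity check, assuming both exports were written with the same schema: the two frames should have identical columns so that `pd.concat` stacks them cleanly below.

# Both metric files should share one column layout before concatenation.
assert list(small_data_metrics.columns) == list(large_data_metrics.columns)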
In [3]:
# Combine the small- and large-dataset metrics into one table keyed by dataset name.
metrics = metrics_comparison_matrix(
    pd.concat([small_data_metrics, large_data_metrics])
      .set_index('dataset_name'))
In [5]:
metrics.to_csv('./metadata/all-metrics-formatted.csv')
metrics
Out[5]: (the formatted metrics for all datasets, indexed by dataset name)
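For a quick visual comparison, one could also plot one of the numeric metrics per dataset. A minimal sketch, assuming matplotlib is available; the count column is selected positionally, since its actual name depends on the metadata files:

import matplotlib.pyplot as plt

raw = pd.concat([small_data_metrics, large_data_metrics]).set_index('dataset_name')
# Column 5 is the count column formatted as int above; plot it per dataset.
raw.iloc[:, 5].plot(kind='bar', logy=True)
plt.ylabel('count (6th metric column)')
plt.tight_layout()
plt.show()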